Skip to main content

Convex Optimization

Convex Functions

We begin with a definition of the graph of a real-valued function.

Definition

The graph of f:ΩR,ΩRnf: \Omega \rightarrow \mathbb{R}, \Omega \subset \mathbb{R}^n, is the set of points in Ω×RRn+1\Omega \times \mathbb{R} \subset \mathbb{R}^{n+1} given by

{[xf(x)]:xΩ}.\left\{\left[\begin{array}{c} \boldsymbol{x} \\ f(\boldsymbol{x}) \end{array}\right]: \boldsymbol{x} \in \Omega\right\} .

We can visualize the graph of ff as simply the set of points on a "plot" of f(x)f(\boldsymbol{x}) versus x\boldsymbol{x}.

Definition

The epigraph of a function f:ΩR,ΩRnf: \Omega \rightarrow \mathbb{R}, \Omega \subset \mathbb{R}^n, denoted epi (f)(f), is the set of points in Ω×R\Omega \times \mathbb{R} given by

epi(f)={[xβ]:xΩ,βR,βf(x)}.\operatorname{epi}(f)=\left\{\left[\begin{array}{l} \boldsymbol{x} \\ \beta \end{array}\right]: \boldsymbol{x} \in \Omega, \beta \in \mathbb{R}, \beta \geq f(\boldsymbol{x})\right\} .

The epigraph epi (f)(f) of a function ff is simply the set of points in Ω×R\Omega \times \mathbb{R} on or above the graph of ff. We can also think of epi (f)(f) as a subset of Rn+1\mathbb{R}^{n+1}.

Recall that a set ΩRn\Omega \subset \mathbb{R}^n is convex if for every x1,x2Ω\boldsymbol{x}_1, \boldsymbol{x}_2 \in \Omega and α(0,1)\alpha \in(0,1), αx1+(1α)x2Ω\alpha \boldsymbol{x}_1+(1-\alpha) \boldsymbol{x}_2 \in \Omega. We now introduce the notion of a convex function.

Definition

A function f:ΩR,ΩRnf: \Omega \rightarrow \mathbb{R}, \Omega \subset \mathbb{R}^n, is convex on Ω\Omega if its epigraph is a convex set.

Theorem

If a function f:ΩR,ΩRnf: \Omega \rightarrow \mathbb{R}, \Omega \subset \mathbb{R}^n, is convex on Ω\Omega, then Ω\Omega is a convex set.

Proof

We prove this theorem by contraposition. Suppose that Ω\Omega is not a convex set. Then, there exist two points y1\boldsymbol{y}_1 and y2\boldsymbol{y}_2 such that for some α\alpha \in (0,1)(0,1)

z=αy1+(1α)y2Ω.\boldsymbol{z}=\alpha \boldsymbol{y}_1+(1-\alpha) \boldsymbol{y}_2 \notin \Omega .

Let

β1=f(y1),β2=f(y2).\beta_1=f\left(\boldsymbol{y}_1\right), \quad \beta_2=f\left(\boldsymbol{y}_2\right) .

Then, the pairs

[y1β1],[y2β2]\left[\begin{array}{l} \boldsymbol{y}_1 \\ \beta_1 \end{array}\right], \quad\left[\begin{array}{l} \boldsymbol{y}_2 \\ \beta_2 \end{array}\right]

belong to the graph of ff, and hence also the epigraph of ff. Let

w=α[y1β1]+(1α)[y2β2]\boldsymbol{w}=\alpha\left[\begin{array}{l} \boldsymbol{y}_1 \\ \beta_1 \end{array}\right]+(1-\alpha)\left[\begin{array}{l} \boldsymbol{y}_2 \\ \beta_2 \end{array}\right]

We have

w=[zαβ1+(1α)β2]\boldsymbol{w}=\left[\begin{array}{c} \boldsymbol{z} \\ \alpha \beta_1+(1-\alpha) \beta_2 \end{array}\right]

But note that w\boldsymbol{w} \notin epi (f)(f), because zΩ\boldsymbol{z} \notin \Omega. Therefore, epi (f)(f) is not convex, and hence ff is not a convex function.

The next theorem gives a very useful characterization of convex functions. This characterization is often used as a definition for a convex function.

Theorem

A function f:ΩRf: \Omega \rightarrow \mathbb{R} defined on a convex set ΩRn\Omega \subset \mathbb{R}^n is convex if and only if for all x,yΩ\boldsymbol{x}, \boldsymbol{y} \in \Omega and all α(0,1)\alpha \in(0,1), we have

f(αx+(1α)y)αf(x)+(1α)f(y)f(\alpha \boldsymbol{x}+(1-\alpha) \boldsymbol{y}) \leq \alpha f(\boldsymbol{x})+(1-\alpha) f(\boldsymbol{y})

Proof

\Leftarrow : Assume that for all x,yΩ\boldsymbol{x}, \boldsymbol{y} \in \Omega and α(0,1)\alpha \in(0,1)

f(αx+(1α)y)αf(x)+(1α)f(y).f(\alpha \boldsymbol{x}+(1-\alpha) \boldsymbol{y}) \leq \alpha f(\boldsymbol{x})+(1-\alpha) f(\boldsymbol{y}) .

Let [x,a]\left[\boldsymbol{x}^{\top}, a\right]^{\top} and [y,b]\left[\boldsymbol{y}^{\top}, b\right]^{\top} be two points in epi (f)(f), where a,bRa, b \in \mathbb{R}. From the definition of epi (f)(f) it follows that

f(x)a,f(y)bf(\boldsymbol{x}) \leq a, \quad f(\boldsymbol{y}) \leq b

Therefore, using the first inequality above, we have

f(αx+(1α)y)αa+(1α)b.f(\alpha \boldsymbol{x}+(1-\alpha) \boldsymbol{y}) \leq \alpha a+(1-\alpha) b .

Because Ω\Omega is convex, αx+(1α)yΩ\alpha \boldsymbol{x}+(1-\alpha) \boldsymbol{y} \in \Omega. Hence,

[αx+(1α)yαa+(1α)b]epi(f)\left[\begin{array}{c} \alpha \boldsymbol{x}+(1-\alpha) \boldsymbol{y} \\ \alpha a+(1-\alpha) b \end{array}\right] \in \operatorname{epi}(f)

which implies that epi (f)(f) is a convex set, and hence ff is a convex function. \Rightarrow : Assume that f:ΩRf: \Omega \rightarrow \mathbb{R} is a convex function. Let x,yΩ\boldsymbol{x}, \boldsymbol{y} \in \Omega and

f(x)=a,f(y)=b.f(\boldsymbol{x})=a, \quad f(\boldsymbol{y})=b .

Thus,

[xa],[yb]epi(f)\left[\begin{array}{l} \boldsymbol{x} \\ a \end{array}\right],\left[\begin{array}{l} \boldsymbol{y} \\ b \end{array}\right] \in \operatorname{epi}(f)

Because ff is a convex function, its epigraph is a convex subset of Rn+1\mathbb{R}^{n+1}. Therefore, for all α(0,1)\alpha \in(0,1), we have

α[xa]+(1α)[yb]=[αx+(1α)yαa+(1α)b]epi(f)\alpha\left[\begin{array}{l} \boldsymbol{x} \\ a \end{array}\right]+(1-\alpha)\left[\begin{array}{l} \boldsymbol{y} \\ b \end{array}\right]=\left[\begin{array}{c} \alpha \boldsymbol{x}+(1-\alpha) \boldsymbol{y} \\ \alpha a+(1-\alpha) b \end{array}\right] \in \operatorname{epi}(f)

The above implies that for all α(0,1)\alpha \in(0,1),

f(αx+(1α)y)αa+(1α)b=αf(x)+(1α)f(y)f(\alpha \boldsymbol{x}+(1-\alpha) \boldsymbol{y}) \leq \alpha a+(1-\alpha) b=\alpha f(\boldsymbol{x})+(1-\alpha) f(\boldsymbol{y})

This completes the proof. A geometric interpretation of the theorem is given below. The theorem states that if f:ΩRf: \Omega \rightarrow \mathbb{R} is a convex function over a convex set Ω\Omega, then for all x,yΩ\boldsymbol{x}, \boldsymbol{y} \in \Omega, the points on the line segment in Rn+1\mathbb{R}^{n+1} connecting [x,f(x)]\left[\boldsymbol{x}^{\top}, f(\boldsymbol{x})\right]^{\top} and [y,f(y)]\left[\boldsymbol{y}^{\top}, f(\boldsymbol{y})\right]^{\top} must lie on or above the graph of ff.

Using the above theorem, it is straightforward to show that any nonnegative scaling of a convex function is convex, and that the sum of convex functions is convex.

Theorem

Suppose that f,f1f, f_1, and f2f_2 are convex functions. Then, for any a0a \geq 0, the function af is convex. Moreover, f1+f2f_1+f_2 is convex.

Proof

Let x,yΩ\boldsymbol{x}, \boldsymbol{y} \in \Omega and α(0,1)\alpha \in(0,1). Fix a0a \geq 0. For convenience, write fˉ=af\bar{f}=a f. We have

fˉ(αx+(1α)y)=af(αx+(1α)y)a(αf(x)+(1α)f(y)) because f is convex and a0=α(af(x))+(1α)(af(y))=αfˉ(x)+(1α)fˉ(y)\begin{aligned} \bar{f}(\alpha \boldsymbol{x}+(1-\alpha) \boldsymbol{y}) & =a f(\alpha \boldsymbol{x}+(1-\alpha) \boldsymbol{y}) \\ & \leq a(\alpha f(\boldsymbol{x})+(1-\alpha) f(\boldsymbol{y})) \text { because } f \text { is convex and } a \geq 0 \\ & =\alpha(a f(\boldsymbol{x}))+(1-\alpha)(a f(\boldsymbol{y})) \\ & =\alpha \bar{f}(\boldsymbol{x})+(1-\alpha) \bar{f}(\boldsymbol{y}) \end{aligned}

which implies that fˉ\bar{f} is convex. Next, write f3=f1+f2f_3=f_1+f_2. We have

f3(αx+(1α)y)=f1(αx+(1α)y)+f2(αx+(1α)y)(αf1(x)+(1α)f1(y))+(αf2(x)+(1α)f2(y)) by convexity of f1 and f2=α(f1(x)+f2(x))+(1α)(f1(y)+f2(y))=αf3(x)+(1α)f3(y),\begin{aligned} f_3(\alpha \boldsymbol{x}+(1-\alpha) \boldsymbol{y})= & f_1(\alpha \boldsymbol{x}+(1-\alpha) \boldsymbol{y})+f_2(\alpha \boldsymbol{x}+(1-\alpha) \boldsymbol{y}) \\ \leq & \left(\alpha f_1(\boldsymbol{x})+(1-\alpha) f_1(\boldsymbol{y})\right)+\left(\alpha f_2(\boldsymbol{x})+(1-\alpha) f_2(\boldsymbol{y})\right) \\ & \quad \text { by convexity of } f_1 \text { and } f_2 \\ = & \alpha\left(f_1(\boldsymbol{x})+f_2(\boldsymbol{x})\right)+(1-\alpha)\left(f_1(\boldsymbol{y})+f_2(\boldsymbol{y})\right) \\ = & \alpha f_3(\boldsymbol{x})+(1-\alpha) f_3(\boldsymbol{y}), \end{aligned}

which implies that f3f_3 is convex. This theorem implies that for any given collection of convex functions f1,,ff_1, \ldots, f_{\ell} and nonnegative numbers c1,,cc_1, \ldots, c_{\ell}, the function c1f2++cfc_1 f_2+\cdots+c_{\ell} f_{\ell} is convex. It is similarly straightforward to show that the function max{f1,,f}\max \left\{f_1, \ldots, f_{\ell}\right\} is convex. We now define the notion of strict convexity.

Definition

A function f:ΩRf: \Omega \rightarrow \mathbb{R} on a convex set ΩRn\Omega \subset \mathbb{R}^n is strictly convex if for all x,yΩ,xy\boldsymbol{x}, \boldsymbol{y} \in \Omega, \boldsymbol{x} \neq \boldsymbol{y}, and α(0,1)\alpha \in(0,1), we have

f(αx+(1α)y)<αf(x)+(1α)f(y).f(\alpha \boldsymbol{x}+(1-\alpha) \boldsymbol{y})<\alpha f(\boldsymbol{x})+(1-\alpha) f(\boldsymbol{y}) .

From this definition, we see that for a strictly convex function, all points on the open line segment connecting the points [x,f(x)]\left[\boldsymbol{x}^{\top}, f(\boldsymbol{x})\right]^{\top} and [y,f(y)]\left[\boldsymbol{y}^{\top}, f(\boldsymbol{y})\right]^{\top} lie (strictly) above the graph of ff.

Definition

A function f:ΩRf: \Omega \rightarrow \mathbb{R} on a convex set ΩRn\Omega \subset \mathbb{R}^n is (strictly) concave if f-f is (strictly) convex.

Convex optimization problems in standard form

A convex optimization problem is one of the form

 minimize f0(x) subject to fi(x)0,i=1,,maiTx=bi,i=1,,p,\begin{array}{ll} \text { minimize } & f_0(x) \\ \text { subject to } & f_i(x) \leq 0, \quad i=1, \ldots, m \\ & a_i^T x=b_i, \quad i=1, \ldots, p, \end{array}

where f0,,fmf_0, \ldots, f_m are convex functions. The convex problem has three additional requirements:

  • the objective function must be convex,
  • the inequality constraint functions must be convex,
  • the equality constraint functions hi(x)=aiTxbih_i(x)=a_i^Tx-b_i must be affine.